Lightning Fast and Space Efficient Inequality Joins

Authors

  • Zuhair Khayyat
  • William Lucia
  • Meghna Singh
  • Mourad Ouzzani
  • Paolo Papotti
  • Jorge-Arnulfo Quiané-Ruiz
  • Nan Tang
  • Panos Kalnis
Abstract

Inequality joins, which join relational tables on inequality conditions, are used in various applications. While there have been a wide range of optimization methods for joins in database systems, from algorithms such as sort-merge join and band join, to various indices such as B-tree, R*-tree and Bitmap, inequality joins have received little attention and queries containing such joins are usually very slow. In this paper, we introduce fast inequality join algorithms. We put columns to be joined in sorted arrays and we use permutation arrays to encode positions of tuples in one sorted array w.r.t. the other sorted array. In contrast to sort-merge join, we use space efficient bit-arrays that enable optimizations, such as Bloom filter indices, for fast computation of the join results. We have implemented a centralized version of these algorithms on top of PostgreSQL, and a distributed version on top of Spark SQL. We have compared against well known optimization techniques for inequality joins and show that our solution is more scalable and several orders of magnitude faster.

1. ONCE UPON A TIME . . .

Bob, a data analyst working for an international provider of cloud services, wanted to analyze revenue and utilization trends from different regions. In particular, he wanted to find out all those transactions from the West-Coast that last longer and produce smaller revenues than any transaction in the East-Coast. In other words, he was looking for any customer from the West-Coast who rented a virtual machine for more hours than any customer from the East-Coast, but who paid less. Figure 1 illustrates a data instance for both tables. He wrote the following join query for such a task:
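
The sorted-array, permutation-array, and bit-array idea from the abstract can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' query or their implementation: the `(dur, rev)` tuple layout, the sample rows, and the function name `ie_join_sketch` are all hypothetical, and a `bytearray` (one byte per flag) stands in for a packed bit-array.

```python
from bisect import bisect_left


def ie_join_sketch(west, east):
    """Return index pairs (i, j) such that
    west[i].dur > east[j].dur and west[i].rev < east[j].rev.

    Both inputs are lists of (dur, rev) tuples (an assumed layout).
    """
    n = len(east)

    # Sorted array 1: east tuples ordered by duration (ascending).
    east_by_dur = sorted(range(n), key=lambda j: east[j][0])
    east_durs = [east[j][0] for j in east_by_dur]

    # Position of each east tuple inside the duration-sorted array.
    pos_in_dur = [0] * n
    for p, j in enumerate(east_by_dur):
        pos_in_dur[j] = p

    # Sorted array 2: east tuples ordered by revenue (descending).
    east_by_rev = sorted(range(n), key=lambda j: east[j][1], reverse=True)
    east_revs = [east[j][1] for j in east_by_rev]

    # Permutation array: for the k-th tuple in revenue order,
    # its position in the duration-sorted array.
    perm = [pos_in_dur[j] for j in east_by_rev]

    # Visit west tuples from highest to lowest revenue, so the set of
    # east tuples with a larger revenue only ever grows.
    west_by_rev = sorted(range(len(west)), key=lambda i: west[i][1], reverse=True)

    bits = bytearray(n)  # bits[p] == 1: east tuple at dur-position p has rev > current west rev
    nxt = 0
    result = []
    for i in west_by_rev:
        w_dur, w_rev = west[i]
        # Mark every east tuple whose revenue is strictly larger.
        while nxt < n and east_revs[nxt] > w_rev:
            bits[perm[nxt]] = 1
            nxt += 1
        # East tuples with a strictly smaller duration occupy positions [0, limit).
        limit = bisect_left(east_durs, w_dur)
        for p in range(limit):
            if bits[p]:
                result.append((i, east_by_dur[p]))
    return result


# Made-up sample rows, (dur, rev); these are not the data from Figure 1.
west = [(140, 9), (100, 12), (90, 5)]
east = [(100, 12), (140, 11), (80, 10)]
pairs = sorted(ie_join_sketch(west, east))
```

The sketch still scans the marked prefix of the array linearly for each west tuple; the index over the bit-array that the abstract mentions as an optimization is not modeled here.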

Similar resources

Errata for "Lightning Fast and Space Efficient Inequality Joins" (PVLDB 8(13): 2074-2085)

This is in response to recent feedback from some readers, which requires some clarifications regarding our IEJoin algorithm published in [1]. The feedback revolves around four points: (1) a typo in our illustrating example of the join process; (2) a naming error for the index used by our algorithm to improve the bit array scan; (3) the sort order used in our algorithms; and (4) a missing explan...

VHF Lightning Observations by Digital Interferometry from ISS / JEM-GLIMS

Global Lightning and sprIte MeasurementS (GLIMS) mission is now ongoing on Exposed Facility of Japanese Experiment Module (JEM-EF) of the International Space Station (ISS). This paper focuses on an electromagnetic (EM) payload of JEM-GLIMS mission, very high frequency (VHF) broadband digital InTerFerometer (VITF). JEM-GLIMS mission is designed to conduct comprehensive observations with both EM ...

Application of Intelligent Water Drops in Transient Analysis of Single Conductor Overhead Lines Terminated to Grid-Grounded Arrester under Direct Lightning Strikes

In this paper, Intelligent water drop algorithm (IWD) is used to analyze single overhead line connected to grid-grounded arrester. In this approach, at first Norton’s equivalent circuit of the overhead line over lossy soil is computed by method of moments (MoM) and then for the problem under consideration, a nonlinear equivalent circuit in the frequency domain is proposed. Finally applying inte...

Efficient Evaluation of the Valid-Time Natural Join

Joins are arguably the most important relational operators. Poor implementations are tantamount to computing the Cartesian product of the input relations. In a temporal database, the problem is more acute for two reasons. First, conventional techniques are designed for the optimization of joins with equality predicates, rather than the inequality predicates prevalent in valid-time queries. Seco...

Processing Inequality Queries

Bernstein and Goodman showed that natural inequality (NI) queries can be processed efficiently by semijoins, if there are no multiple inequality join edges, nor cycles with one or zero doublet. In this paper procedures to handle these cases efficiently are given. Multiple inequality join edges can be processed by multi-attribute inequality semijoins. Two procedures based on generalized semi-j...

Journal:
  • PVLDB

Volume 8

Publication date: 2015